When Multiwords Go Bad in Machine Translation

Authors

  • Anabela Barreiro
  • Johanna Monti
  • Brigitte Orliac
  • Fernando Batista

Abstract

This paper addresses the impact of multiword translation errors in machine translation (MT). We analysed translations of multiwords produced by the OpenLogos rule-based system (RBMT) and the Google Translate statistical system (SMT) for the English-French, English-Italian and English-Portuguese language pairs. Our study shows that, for distinct reasons, multiwords remain a problematic area for MT regardless of the approach, and that their evaluation requires adequate linguistic quality metrics founded on a systematic categorization of errors by expert MT linguists. We propose an empirically driven taxonomy for multiwords and highlight the need to develop specific corpora for multiword evaluation. Finally, the paper presents the Logos approach to multiword processing, illustrating how semantico-syntactic rules contribute to the quality of multiword translation.
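
The abstract argues that multiword evaluation should rest on a systematic, linguist-built categorization of errors. As a rough illustration only, the Python sketch below tallies linguist-annotated multiword occurrences for one MT system and reports an error rate plus a per-category breakdown. The record format, the `error_profile` function, the category labels (`literal_translation`, `discontinuity_lost`) and the sample records are all hypothetical; they are not the paper's taxonomy, method or data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MultiwordAnnotation:
    """One multiword occurrence in MT output, judged by a linguist (hypothetical format)."""
    system: str                    # e.g. "RBMT" or "SMT"
    language_pair: str             # e.g. "EN-FR"
    source_multiword: str          # the multiword as it appears in the source text
    error_category: Optional[str]  # None means the multiword was translated correctly


def error_profile(annotations: List[MultiwordAnnotation], system: str) -> Tuple[float, Counter]:
    """Return (share of erroneous multiwords, Counter of error categories) for one system."""
    rows = [a for a in annotations if a.system == system]
    if not rows:
        return 0.0, Counter()
    # Count only occurrences that the annotator flagged with an error category.
    errors = Counter(a.error_category for a in rows if a.error_category is not None)
    return sum(errors.values()) / len(rows), errors


# Invented toy records for illustration; these are not results from the paper.
sample = [
    MultiwordAnnotation("SMT", "EN-PT", "take into account", None),
    MultiwordAnnotation("SMT", "EN-PT", "red tape", "literal_translation"),
    MultiwordAnnotation("SMT", "EN-IT", "look up", "discontinuity_lost"),
    MultiwordAnnotation("RBMT", "EN-FR", "red tape", None),
]

rate, categories = error_profile(sample, "SMT")
print(rate, dict(categories))  # two of the three SMT occurrences carry an error label
```

Keeping the category as a free label on each annotation lets the same tally work with whatever taxonomy the linguists settle on.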


Similar resources

Machine Translation of Non-Contiguous Multiword Units

Non-adjacent linguistic phenomena such as non-contiguous multiwords and other phrasal units containing insertions, i.e., words that are not part of the unit, are difficult to process and remain a problem for NLP applications. Non-contiguous multiword units are common across languages and constitute some of the most important challenges to high quality machine translation. This paper presents an...

CLUE-Aligner: An Alignment Tool to Annotate Pairs of Paraphrastic and Translation Units

Currently available alignment tools and procedures for marking-up alignments overlook non-contiguous multiword units for being too complex within the bounds of the proposed alignment methodologies. This paper presents the CLUE-Aligner (Cross-Language Unit Elicitation Aligner), a web alignment tool designed for manual annotation of pairs of paraphrastic and translation units, representing both c...

Computing Transfer Score in Example-Based Machine Translation

This paper presents an idea in Example-Based Machine Translation for computing the transfer score of each produced translation. When an EBMT system finds an example in the translation memory, it tries to modify the sentence in order to produce the best possible translation of the input sentence. The user of the system, however, is unable to judge the quality of the translation. This problem can be s...

Universal Words and their relationship to Multilinguality, Wordnet and Multiwords

In this article we address issues concerning construction of lexicon in the context of sentential knowledge representation in Universal Networking Language (UNL), an interlingua proposed in 1996 for machine translation. Lexical knowledge in UNL is in the form of Universal Words (UWs) which are concepts expressed by mostly English words disambiguated and stored in the universal words repository....

Detecting Hidden Multiwords in Bilingual Dictionaries

Dictionaries are a valuable source of information about multiwords. Unfortunately, only few multiwords are explicitly marked as such in dictionaries: most of them are presented without being distinguished from free combinations of words. In this paper we present a methodology for detecting hidden multiwords in bilingual dictionaries, along with their translation in another language. The methodo...

Publication date: 2013